Analysis files available here for download:

https://github.com/ljcohen/ljcohen.github.io/tree/master/analyses/SLR

Table of Contents:

  1. Experiment Description
  2. Lab Protocols
  3. Data Collection
  4. Raw Data QC
  5. PCA Results Figures
  6. Transcript-level expression for each chip
  7. Results Files
  8. Venn Diagram Results Figures
  9. Comments
  10. Contact
  11. References

1. Experiment Description

This experiment consisted of hard coral samples, Orbicella faveolata collected from 3 sites (South, Ledge, Central) on the St. Lucie Reef on 3 different dates (June 2012, Oct 2012, Jan/Feb 2013). At least 4 samples at each site were collected from separate colonies for a total of 44 samples.

The aim was to see if gene expression patterns were different in the samples based on spatial or temporal differences between samples.

2. Lab Protocols

Coral samples were immediately preserved in the field in a guanidinium thiocyanate-phenol nucleic acid preservative, transported back to the lab on ice and stored at -80ºC. In a clean RNase free lab setting, phenol-chloroform extractions were performed on all samples. Aliquots were removed and chloroform added at a ratio of 1:5. RNA was separated in the aqueous layer, precipitated with ethanol and purified with LiCl. A NanoDrop® 1000 spectrophotometer (Thermo Fisher Scientific, Inc., MA USA) was used to quantify total RNA. Label-IT© (Mirus Bio LLC, Wisconsin, USA) nucleic acid labeling kit (MIR 3700) was used to covalently attach Cy™ 5 (excitation 650 nm and emission 670 nm) cyanine fluorescent dye at a frequency of one label about every 20-60 base pairs to 5µg of each sample. Fluorescent dye labeling allows detection if binding occurs between complimentary RNA fragments in the sample and known oligonucleotides spotted on the microarray. Labeling was done according to the standard manufacturer’s protocol (Lit# ML002, Rev 11/15/06) and quantified on a NanoDrop® 1000 spectrophotometer, wavelengths 225-750 nm. All Cy™ 5-labeled total RNA samples were hybridized to custom anti-sense 4x2K oligonucleotide CustomArrays™ using methods described in the hybridization and imaging protocol (Ecogenomics, Inc., Japan). Total RNA samples were fragmented with 10X fragmentation reagent (buffered zinc solution) and stop solution (200 mM EDTA, pH 8.0) (Catalog #AM8740, Ambion®, Applied Biosystems, Texas, USA).

3. Data Collection

Microarray chips were imaged at 635 nm using a high-resolution laser GenePix® 4200A microarray scanner (Axon, Molecular Devices, California, USA). Data were extracted from images using GenePixPro 6.0 software as .gpr files.

##          
##           JanFeb13 Jun12 Oct12
##   Central        4     4     0
##   Ledge          6     5     6
##   South          9     6     4
##      Chip.Number                             File.Name  Hyb.Date Colony    Site Array     Date             Var
## 1_4            1       63600_2013-05-24_250_array4.gpr 5/24/2013      2   South     4    Oct12      SouthOct12
## 1_3            1       63600_2013-05-24_300_array3.gpr 5/24/2013      5   Ledge     3    Jun12      LedgeJun12
## 1_2            1    63600_2013-05-24_370_arrays1,2.gpr 5/24/2013      5   South     2    Jun12      SouthJun12
## 1_1            1       63600_2013-05-24_250_array1.gpr 5/24/2013      5   Ledge     1 JanFeb13   LedgeJanFeb13
## 2_4            2      63600_2013-12-05_0220_array4.gpr 12/5/2013      4 Central     4 JanFeb13 CentralJanFeb13
## 2_3            2      63600_2013-12-05_0220_array3.gpr 12/5/2013      5   South     3    Oct12      SouthOct12
## 2_2            2   63600_2013-12-05_0300_arrays1,2.gpr 12/5/2013      3   Ledge     2 JanFeb13   LedgeJanFeb13
## 2_1            2   63600_2013-12-05_0300_arrays1,2.gpr 12/5/2013      5   Ledge     1    Oct12      LedgeOct12
## 3_4            3      63601_2013-06-06_0240_array4.gpr  6/6/2013      8   Ledge     4    Oct12      LedgeOct12
## 3_3            3  63601_2013-06-06_300_arrays1,2,3.gpr  6/6/2013      4   South     3    Oct12      SouthOct12
## 3_2            3  63601_2013-06-06_300_arrays1,2,3.gpr  6/6/2013      2 Central     2 JanFeb13 CentralJanFeb13
## 3_1            3  63601_2013-06-06_300_arrays1,2,3.gpr  6/6/2013      6   Ledge     1    Oct12      LedgeOct12
## 4_4            4           63605_2013-06-11_array4.gpr 6/11/2013      2 Central     4    Jun12    CentralJun12
## 4_3            4           63605_2013-06-11_array3.gpr 6/11/2013      5   South     3    Jun12      SouthJun12
## 4_2            4    63605_2013-06-11_300_arrays1,2.gpr 6/11/2013      5   Ledge     2    Jun12      LedgeJun12
## 4_1            4    63605_2013-06-11_300_arrays1,2.gpr 6/11/2013      1   South     1    Jun12      SouthJun12
## 5_4            5       63606_2013-06-11_240_array4.gpr 6/11/2013      2   South     4 JanFeb13   SouthJanFeb13
## 5_3            5      63606_2013-06-11_0270_array3.gpr 6/11/2013      4   South     3 JanFeb13   SouthJanFeb13
## 5_2            5    63606_2013-06-11_300_arrays1,2.gpr 6/11/2013      6   Ledge     2 JanFeb13   LedgeJanFeb13
## 5_1            5    63606_2013-06-11_300_arrays1,2.gpr 6/11/2013      5   South     1 JanFeb13   SouthJanFeb13
## 6_4            6      63608_2013-12-05_0270_array4.gpr 12/5/2013      2   Ledge     4    Jun12      LedgeJun12
## 6_3            6      63608_2013-12-05_0270_array3.gpr 12/5/2013      1   South     3    Jun12      SouthJun12
## 6_2            6   63608_2013-12-05_0280_arrays1,2.gpr 12/5/2013      6 Central     2 JanFeb13 CentralJanFeb13
## 6_1            6   63608_2013-12-05_0280_arrays1,2.gpr 12/5/2013      8   Ledge     1 JanFeb13   LedgeJanFeb13
## 7_4            7           63710_2013-05-24_array4.gpr 5/24/2013      6 Central     4    Jun12    CentralJun12
## 7_3            7       63710_2013-05-24_300_array3.gpr 5/24/2013      2 Central     3    Jun12    CentralJun12
## 7_2            7    63710_2013-05-24_350_arrays1,2.gpr 5/24/2013      2   South     2    Jun12      SouthJun12
## 7_1            7    63710_2013-05-24_350_arrays1,2.gpr 5/24/2013      6   Ledge     1    Jun12      LedgeJun12
## 8_4            8       63768_2013-05-30_230_array4.gpr 5/30/2013      5   South     4 JanFeb13   SouthJanFeb13
## 8_3            8       63768_2013-05-30_250_array3.gpr 5/30/2013      6   Ledge     3 JanFeb13   LedgeJanFeb13
## 8_2            8    63768_2013-05-30_300_arrays1,2.gpr 5/30/2013      2   South     2 JanFeb13   SouthJanFeb13
## 8_1            8    63768_2013-05-30_300_arrays1,2.gpr 5/30/2013      4   South     1 JanFeb13   SouthJanFeb13
## 9_4            9    63769_2013-05-30_230_arrays3,4.gpr 5/30/2013      6   Ledge     4    Oct12      LedgeOct12
## 9_3            9    63769_2013-05-30_230_arrays3,4.gpr 5/30/2013      2 Central     3 JanFeb13 CentralJanFeb13
## 9_2            9    63769_2013-05-30_300_arrays1,2.gpr 5/30/2013      4   South     2    Oct12      SouthOct12
## 9_1            9    63769_2013-05-30_300_arrays1,2.gpr 5/30/2013      8   Ledge     1    Oct12      LedgeOct12
## 10_4          10         63770_2013-06-05_0220_all.gpr  6/5/2013      8   Ledge     4    Oct12      LedgeOct12
## 10_3          10         63770_2013-06-05_0220_all.gpr  6/5/2013      4   South     3    Jun12      SouthJun12
## 10_2          10         63770_2013-06-05_0220_all.gpr  6/5/2013      2 Central     2    Jun12    CentralJun12
## 10_1          10         63770_2013-06-05_0220_all.gpr  6/5/2013      6   Ledge     1    Jun12      LedgeJun12
## 11_4          11       63771_2013-06-06_220_array4.gpr  6/6/2013      2   South     4 JanFeb13   SouthJanFeb13
## 11_3          11 63771_2013-06-06_0250_arrays1,2,3.gpr  6/6/2013      4   South     3 JanFeb13   SouthJanFeb13
## 11_2          11 63771_2013-06-06_0250_arrays1,2,3.gpr  6/6/2013      6   Ledge     2 JanFeb13   LedgeJanFeb13
## 11_1          11 63771_2013-06-06_0250_arrays1,2,3.gpr  6/6/2013      5   South     1 JanFeb13   SouthJanFeb13

4. Raw Data QC:

Data were imported into the R statistical programming environment. The R-Bioconductor package, limma (Linear Models for Microarray Data, version 3.22.3, Ritchie et al. 2015; Smyth 2004) was used to assess the quality of the data and to analyze for differential expression. All data were shifted +1 to accomodate log2 transformation. These are the first 6 rows of shifted +1 raw microarray data (not log2 transformed):

##                      1_4 1_3 1_2 1_1 2_4 2_3 2_2 2_1 3_4 3_3 3_2 3_1 4_4 4_3 4_2 4_1 5_4 5_3 5_2 5_1 6_4 6_3 6_2 6_1 7_4 7_3 7_2 7_1 8_4 8_3 8_2 8_1 9_4 9_3 9_2 9_1 10_4 10_3 10_2 10_1 11_4 11_3 11_2 11_1
## DkC.G11._17_51        14  10  28   2  11   6  15  12   9  13   6   3  13   6   9   5  19  10   8  20  12  12  10   5   8  13  15   1  18  19  17   7  16   6  18   7   10    7    4    4    7    8    2    4
## Ac_28_65_99           62  45  52   2  43  15  47  30  33  41  15   5  36  15  26   7  78  34  29  21  39  32  30  12  55  60  72  24  61  62  66  22  61  27  73  20   34   23   15    9   26   41   13   11
## Hsp27_cont.B1._17_51  24  20  37   2  26   8  19  17  19  21  11   5  21   8  11   6  44  20  20  20  26  21  18   9  20  30  32  13  37  38  27  12  33  11  30  14   21   13    8    4   19   20    8    4
## SedC.E9._131_165      36  20  70   3  24  11  30  22  29  25  14   8  27  12  17  10  51  26  27  20  35  27  25  13  29  31  26  10  60  64  36  16  45  12  31  17   31   18    8    6   24   28   13    5
## Ac_24_265_299         23  13  13   1  21   7  11   8  18  17   6   5  22  12  14   3  30  15  13  17  25  15  20   6  19  18  18   6  28  27  17   7  20   7  19   7   14    8    5    3   10   12    4    4
## Ac_3_209_243          10  10  23   1   8   2   7   7   9  11   4   3   8   4  10   4  13  11  10  15   7   6   4   5   8  16  22   5  12   9   6   4  10   5  20   7    9    4    4    3    7    8    4    3

Data were log2 transformed then normalized with 2 methods: the internal limma function “normalizeBetweenArrays” and the base R function, “normalizeCyclicLoess” Loess transformation. See “HBOI_SLR_microarray_log2_norm_results.ppt” and 3 arrayQualityMetrics reports for further details on the quality assessment of the raw data and why the Loess normalization method was chosen to smooth the data.

http://ljcohen.github.io/analyses/SLR/array_quality_metrics/QAreport_notransform/index.html

http://ljcohen.github.io/analyses/SLR/array_quality_metrics/QAreport_log2transform/index.html

http://ljcohen.github.io/analyses/SLR/array_quality_metrics/QAreport_log2transform_Loess/index.html

http://ljcohen.github.io/analyses/SLR/array_quality_metrics/QAreport_log2transform_normBetweenArrays/index.html

http://ljcohen.github.io/analyses/SLR/array_quality_metrics/HBOI_SLR_microarray_log2_norm_results.pptx.pdf

Typically, when data are noisy like this, and grouped slightly by technical variables such as hyb date and chip, normalization and removal of outliers can help with smoothing the data to pull out biological variation. (See Section 6.3 Between-Array Normalization of limma user guide: http://www.bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf)

Internal control probes were removed. These probes are supplied by the manufacturer and contain plant (SpkCtrl: RCP1| AF168390_1144_1178) and bacterial (SpkCtrl: NAC1| AF198054_481_515) sequences. We did not spike-in control RNA into these samples. Since total mRNA from corals potentially contain these sequences, we do typically see binding to these probes despite the absence of spike-in and they are not accurate to use as controls.

5. PCA results

Principal component analysis (PCA) reduces all expression data for each sample down to one point and compares samples in unitless vector space. The farther apart the samples are, the greater the differences there are in the data. When doing a differential expression analysis, it is important to understand where the variability is between samples so results can be interpreted accordingly.

Variables such as chip, hyb date, site, sample collection date, and “var” (site and sample date combined) are color-coded in the PCA below to see if there are differences between groups. Chip and hyb date are technical groupings, so we do not expect to see these data clustered together. Site and sample collection date are both biological groupings. If these data appear to be grouped together, then we know differences in gene expression between samples are due to biological variation and not technical variation.

chip

Data from chip2 appear to group together on the left along PC1 (the primary source of experimental variation). We could try to remove this chip as a potential outlier.

plot of chunk unnamed-chunk-7

hyb date

plot of chunk unnamed-chunk-8

site

plot of chunk unnamed-chunk-9

date

plot of chunk unnamed-chunk-10

var

plot of chunk unnamed-chunk-11

Since the data do not appear to strongly cluster together based on the biolgical groupings, site and sample collection date we will see if removing data from chip 2 will cause the data to become more spread apart, with greater differences between biological groups.

chip

This time, we see that there are now several chips that group together along PC2 (the secondary source of variation in the experiment). We could try removing these later as well to see if the groupings will improve.

plot of chunk unnamed-chunk-12

hyb date

plot of chunk unnamed-chunk-13

site

plot of chunk unnamed-chunk-14

date

plot of chunk unnamed-chunk-15

var

plot of chunk unnamed-chunk-16

Here, we will try removing data from chip 10 to see if this will imrove the groupings.

chip

plot of chunk unnamed-chunk-17

hyb date

plot of chunk unnamed-chunk-18

site

plot of chunk unnamed-chunk-19

date

plot of chunk unnamed-chunk-20

var

plot of chunk unnamed-chunk-21

6. Transcript-level expression for each chip

Transcript-level expression variability for each array used in the experiment. There are 150 transcripts represented on the microarray, with 15 spots per transcripts in 5 pieces (3 spots per piece). Box plots are mean, +- sd and min/max expression.

plot of chunk unnamed-chunk-22 plot of chunk unnamed-chunk-22 plot of chunk unnamed-chunk-22 plot of chunk unnamed-chunk-22 plot of chunk unnamed-chunk-22 plot of chunk unnamed-chunk-22 plot of chunk unnamed-chunk-22 plot of chunk unnamed-chunk-22 plot of chunk unnamed-chunk-22 plot of chunk unnamed-chunk-22 plot of chunk unnamed-chunk-22 plot of chunk unnamed-chunk-22 plot of chunk unnamed-chunk-22 plot of chunk unnamed-chunk-22 plot of chunk unnamed-chunk-22 plot of chunk unnamed-chunk-22 plot of chunk unnamed-chunk-22 plot of chunk unnamed-chunk-22 plot of chunk unnamed-chunk-22 plot of chunk unnamed-chunk-22 plot of chunk unnamed-chunk-22 plot of chunk unnamed-chunk-22 plot of chunk unnamed-chunk-22 plot of chunk unnamed-chunk-22 plot of chunk unnamed-chunk-22 plot of chunk unnamed-chunk-22 plot of chunk unnamed-chunk-22 plot of chunk unnamed-chunk-22 plot of chunk unnamed-chunk-22 plot of chunk unnamed-chunk-22 plot of chunk unnamed-chunk-22 plot of chunk unnamed-chunk-22 plot of chunk unnamed-chunk-22 plot of chunk unnamed-chunk-22 plot of chunk unnamed-chunk-22 plot of chunk unnamed-chunk-22

Results

These data do not group strongly by the biological variables: site, date, or combined site-date. We will make comparisons anyway to see if there are differences.

Here we will create an interaction model, assuming a factorial design for the experiment. This is a 3 x 3 factorial design, even though there were no samples collected at the central site in Oct 2012. Differential expression models work by taking multiple pairwise comparisons, then looking at the overlap between the comparisons.

By Transcript, 150 transcripts total on the array

plot of chunk unnamed-chunk-24

plot of chunk unnamed-chunk-26

By probe, 726 probes on the array

Mean of each probe piece (5 pieces per transcript) to see if there are measureable expression differences between groups. There are 726 probes per array.

plot of chunk unnamed-chunk-28

plot of chunk unnamed-chunk-30

By Function, 29 functions on the array

For example, nucleic acid modification, growth & development, etc.

plot of chunk unnamed-chunk-32

plot of chunk unnamed-chunk-34

9. Comments

After looking over these results, let me know if you agree with the analysis methods or have any questions. Please feel free to suggest alternative methods.

10. Contact

Lisa Cohen, Ph.D. student MCIP, University of California, Davis ljcohen@ucdavis.edu 321-427-9335

11. References

Izarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP. (2003). Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics. 2003 Apr;4(2):249-64. http://www.ncbi.nlm.nih.gov/pubmed/12925520

Ritchie, ME, Phipson, B, Wu, D, Hu, Y, Law, CW, Shi, W, and Smyth, GK (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research 43(7), e47. http://www.ncbi.nlm.nih.gov/pubmed/25605792

Smyth, G. K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology, Vol. 3, No. 1, Article 3. http://www.ncbi.nlm.nih.gov/pubmed/16646809 http://www.statsci.org/smyth/pubs/ebayes.pdf

R-Bioconductor: http://www.bioconductor.org/

limma: http://www.bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf